Abstract: Vertical Bell Laboratories Layered Space-Time (V-BLAST) is a multiple-input multiple-output (MIMO) wireless communication system that uses multiple antenna elements at the transmitter and receiver to offer high spectral efficiency and increased capacity in a rich multipath environment without increasing the use of the spectrum. However, the detection schemes employed in such systems become computationally expensive as the number of transmitting and receiving antennas increases. Reducing the computational cost is essential for time-critical applications such as high-speed packet transfer using MIMO in the 3GPP and 3GPP2 standards. This paper compares the performance of several detection schemes, namely the conventional detection scheme, an efficient square-root algorithm, an improved square-root algorithm and an improved square-root algorithm based on Cholesky factorization, when applied to a multimedia application with 4x4 and 12x12 arrays for varying SNR. The performance parameters considered are bit error rate (BER), symbol error rate (SER), peak signal-to-noise ratio (PSNR), number of floating point operations (FLOPS) and time required for detection. Among the modulation schemes, the best performance is achieved with M-QAM, whereas among the detection schemes, nonlinear detection with interference cancellation (IC) using a minimum mean square error (MMSE) filter performs better than zero forcing (ZF) in terms of BER, SER and PSNR, at the cost of an increase in FLOPS. To reduce computational complexity, the improved square-root algorithm based on efficient inverse Cholesky factorization with MMSE is employed: for 12 transmitting and 12 receiving antennas it requires 0.3 × 10^6 FLOPS, a reduction of 0.7 × 10^6, 0.9 × 10^6, 1.5 × 10^6 and 1.7 × 10^6 FLOPS compared to the improved square-root algorithm, the efficient square-root algorithm and the conventional detection scheme employing ZF and MMSE filters, respectively. This algorithm is faster than the existing efficient V-BLAST algorithms.

Index Terms: Multiple-input multiple-output (MIMO) systems, Bell Laboratories Layered Space-Time (BLAST), vertical BLAST (V-BLAST), zero forcing (ZF), bit error rate (BER), symbol error rate (SER), peak signal-to-noise ratio (PSNR), floating point operations (FLOPS).

I. INTRODUCTION 

Digital communication using multiple-input multiple-output (MIMO) wireless systems, characterized by multiple antenna elements at the transmitter and receiver, has demonstrated the potential for increased capacity in rich multipath environments [1]-[4]. Such systems exploit the spatial properties of the multipath channel, thereby offering a new dimension that can be used to enhance communication performance. The Bell Labs Layered Space-Time (BLAST) architecture [5], including the relatively simple vertical BLAST (V-BLAST) [6], is such a system: it maximizes the data rate by transmitting independent data streams simultaneously from multiple antennas. V-BLAST often adopts the ordered successive interference cancellation (OSIC) detector [6], which detects the data streams iteratively with the optimal ordering. In each iteration, the data stream with the highest signal-to-noise ratio (SNR) among all undetected data streams is detected through a zero-forcing (ZF) or minimum mean square error (MMSE) filter; this is referred to as nulling. The optimal detection order is from the strongest to the weakest signal, since this minimizes the propagation of errors from one detection step to the next. The effect of the detected data stream is then subtracted from the received signal vector; this is referred to as interference cancellation. The main computational bottleneck in the conventional detection algorithm is the step where the optimal ordering for the sequential estimation and detection of the transmitted signals, as well as the corresponding nulling vector, is determined; current implementations devote 90% of the total computational cost to this step. This high computational cost limits the range of applications that admit inexpensive real-time solutions. Moreover, when the numbers of transmitting and receiving antennas are large, the repeated pseudo-inverses required by the conventional detection algorithm can lead to numerical instability, so a numerically robust and stable algorithm is required. To reduce the computational complexity, an efficient square-root algorithm [7], [8] has been proposed. The algorithm is numerically stable since it is division-free and uses only orthogonal transformations such as Householder transformations or a sequence of Givens rotations [9], [10]. To further reduce the computational cost, an improved square-root algorithm has been proposed [11], which speeds up the original square-root algorithm by 45% in terms of the number of additions and multiplications by reusing intermediate results. An improved square-root algorithm for V-BLAST based on efficient inverse Cholesky factorization [12] computes a triangular square root of the estimation error covariance matrix using inverse Cholesky factorization and applies it to the improved square-root algorithm, offering further computational savings. This algorithm is faster than the existing efficient V-BLAST detection algorithms.

Figure 1: V-BLAST system model

The remainder of the paper is organized as follows. Section II describes the V-BLAST system model. Section III introduces the V-BLAST detection schemes, namely the conventional detection algorithm, an efficient square-root algorithm, an improved square-root algorithm and an improved square-root algorithm based on efficient inverse Cholesky factorization, along with their simulation results. Finally, Section IV concludes the paper.

In the following sections, $(\cdot)^T$, $(\cdot)^*$ and $(\cdot)^H$ denote matrix transposition, matrix conjugate and matrix conjugate transposition, respectively. $0_M$ is the $M \times 1$ zero column vector, while $I_M$ is the identity matrix of size $M$.



II. SYSTEM MODEL 

The V-BLAST system consists of M transmitting and N receiving antennas in a rich-scattering environment, as illustrated in Figure 1. A single data stream is demultiplexed into M substreams, and each substream is encoded into symbols and fed to its respective transmitter. Transmitters 1 to M operate co-channel at a symbol rate of 1/T symbols/s, with synchronized symbol timing. Each transmitter is itself an ordinary QAM transmitter. The collection of transmitters therefore forms a vector-valued transmitter, where the components of each transmitted M-vector are symbols drawn from a QAM constellation. The power launched by each transmitter is proportional to 1/M, so that the total radiated power is constant and independent of M.

Let the signal vector transmitted from the M antennas be $a = [a_1, a_2, \ldots, a_M]^T$ with covariance $E(aa^H) = \sigma_a^2 I_M$. Then the received vector $r$ is given by

$$ r = Ha + w, \tag{1} $$

where $w$ is the $N \times 1$ zero-mean circularly symmetric complex Gaussian (ZMCSCG) noise vector with covariance $\sigma_w^2 I_N$, and $H = [h_1, h_2, \ldots, h_M]$ is the $N \times M$ complex channel matrix; $h_m$ and $h^n$ denote the $m$-th column and the $n$-th row of $H$, respectively.

The linear zero-forcing (ZF) estimate of $a$ is

$$ \hat{a} = H^{+} r = (H^H H)^{-1} H^H r. \tag{2} $$

Define $\alpha = \sigma_w^2 / \sigma_a^2$. The linear minimum mean square error (MMSE) estimate of $a$ is

$$ \hat{a} = (H^H H + \alpha I_M)^{-1} H^H r. \tag{3} $$

Let $R = H^H H + \alpha I_M$. Then the estimation error covariance matrix $P$ [4] is given by

$$ P = R^{-1} = (H^H H + \alpha I_M)^{-1}. \tag{4} $$
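A minimal NumPy sketch of the linear estimators (2) and (3) and of the matrix $P$ in (4) follows. The function names are illustrative only, the normal equations are solved instead of forming explicit inverses, and the example assumes unit-energy QPSK symbols and noiseless reception ($w = 0$) purely so that the result can be checked.

```python
import numpy as np

def zf_estimate(H, r):
    """Linear ZF estimate (2): solves (H^H H) a = H^H r."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh @ r)

def mmse_estimate(H, r, alpha):
    """Linear MMSE estimate (3), with alpha = sigma_w^2 / sigma_a^2."""
    Hh = H.conj().T
    R = Hh @ H + alpha * np.eye(H.shape[1])
    return np.linalg.solve(R, Hh @ r)

def error_covariance(H, alpha):
    """P = R^{-1} = (H^H H + alpha I_M)^{-1}, as in (4)."""
    M = H.shape[1]
    return np.linalg.inv(H.conj().T @ H + alpha * np.eye(M))

# Example: 4x4 Rayleigh channel, unit-energy QPSK symbols, noiseless reception (w = 0)
rng = np.random.default_rng(0)
N, M = 4, 4
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
a = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)
r = H @ a
a_zf = zf_estimate(H, r)                  # recovers a up to rounding error
a_mmse = mmse_estimate(H, r, alpha=0.1)   # regularized, hence biased, estimate
P = error_covariance(H, alpha=0.1)
```

With noise present, the regularization term $\alpha I_M$ is what keeps the MMSE solve well conditioned when $H^H H$ is nearly singular.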

The ordered successive interference cancellation (OSIC) detector detects the $M$ entries of the transmit vector $a$ iteratively with the optimal ordering. In each iteration, the entry with the highest SNR among all undetected entries is detected by a linear filter, and its interference is then cancelled from the received signal vector [5]. Suppose that the entries of $a$ are permuted such that the detected entry is $a_M$, the $M$-th entry. Then the interference is cancelled by

$$ r^{(M-1)} = r^{(M)} - h_M a_M, \tag{5} $$

where $a_M$ is treated as the correctly detected entry and the initial $r^{(M)} = r$. The reduced-order problem is then

$$ r^{(M-1)} = H_{M-1} a_{M-1} + w, \tag{6} $$

where the deflated channel matrix is $H_{M-1} = [h_1, h_2, \ldots, h_{M-1}]$ and the reduced transmit vector is $a_{M-1} = [a_1, a_2, \ldots, a_{M-1}]^T$. The linear estimate of $a_{M-1}$ can be deduced from (6), and the detection proceeds iteratively until all entries are detected.
III. V-BLAST DETECTION SCHEMES

The V-BLAST detection schemes are summarized as follows: 

A. Conventional Detection Scheme

a) Compute a linear transform matrix $P$ for nulling. The most common nulling criteria are zero forcing (7) and minimum mean square error (8), for which the corresponding linear transform matrices are

$$ P = H^{+} = (H^H H)^{-1} H^H, \tag{7} $$

$$ P = (H^H H + \alpha I_M)^{-1} H^H, \tag{8} $$

where $(\cdot)^{+}$ denotes the Moore-Penrose pseudo-inverse and $(\cdot)^H$ denotes the conjugate (Hermitian) transpose.

b) Determine the optimal ordering for detection of the transmitted symbols by

$$ k = \arg\min_j \lVert (P)_j \rVert^2, \tag{9} $$

where $(P)_j$ is the $j$-th row of $P$.

Iterative Detection: 

c) Obtain the $k$-th nulling vector $w_k$ by

$$ w_k = (P)_k, \tag{10} $$

where $(P)_k$ is the $k$-th row of $P$.

d) Using the nulling vector $w_k$, form the decision statistic $y_k$:

$$ y_k = w_k r_i, \tag{11} $$

where $r_i$ is the received column vector at the $i$-th iteration (at the first iteration, $i = 1$ and $r_1 = r$).

e) Slice $y_k$ to obtain $\hat{a}_k$:

$$ \hat{a}_k = Q(y_k), \tag{12} $$

where $Q(\cdot)$ denotes the quantization (slicing) operation appropriate to the constellation in use.

f) Interference cancellation (reduced-order problem): assuming that $\hat{a}_k = a_k$, cancel $a_k$ from the received vector $r_i$, resulting in the modified received vector $r_{i+1}$:

$$ r_{i+1} = r_i - \hat{a}_k (H)_k, \tag{13} $$

where $(H)_k$ denotes the $k$-th column of $H$.

g) Deflate $H$: the deflated matrix $H_{\bar{k}_i}$ is obtained by zeroing columns $k_1, \ldots, k_i$ of $H$ (the streams detected so far),

$$ H \leftarrow H_{\bar{k}_i}. \tag{14} $$

h) Form the linear transform matrix $P$ from the deflated $H$ according to the chosen nulling criterion, zero forcing (7) or minimum mean square error (8).

i) Determine the optimal ordering for detection of the next transmitted symbol by

$$ k = \arg\min_{j \notin \{k_1, \ldots, k_i\}} \lVert (P)_j \rVert^2. \tag{15} $$

j) If $i < M$, let $i = i + 1$ and go back to step c).
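Steps a)-j) can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the simulation code used for the figures: the helper names `qpsk_slice` and `conventional_vblast` are hypothetical, the slicer assumes unit-energy QPSK, and the deflation (14) is done by zeroing the detected columns so that stream indices stay fixed.

```python
import numpy as np

def qpsk_slice(y):
    """Q(.) in (12) for unit-energy QPSK: nearest point of (+-1 +-1j)/sqrt(2)."""
    return (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)

def conventional_vblast(H, r, alpha=None):
    """OSIC detection following steps a)-j); ZF (7) if alpha is None, else MMSE (8)."""
    N, M = H.shape
    r = r.astype(complex)                  # r_i, updated by (13)
    Hd = H.astype(complex)                 # deflated channel (14): detected columns zeroed
    remaining = list(range(M))
    a_hat = np.zeros(M, dtype=complex)
    for _ in range(M):
        if alpha is None:
            P = np.linalg.pinv(Hd)                                   # (7)
        else:
            Hh = Hd.conj().T
            P = np.linalg.solve(Hh @ Hd + alpha * np.eye(M), Hh)     # (8)
        row_norms = np.sum(np.abs(P) ** 2, axis=1)                  # ||(P)_j||^2
        k = min(remaining, key=lambda j: row_norms[j])               # (9) / (15)
        y = P[k] @ r                                                 # (10)-(11)
        a_hat[k] = qpsk_slice(y)                                     # (12)
        r = r - a_hat[k] * H[:, k]                                   # (13)
        Hd[:, k] = 0                                                 # (14)
        remaining.remove(k)
    return a_hat

# Noiseless 4x4 example with unit-energy QPSK symbols
rng = np.random.default_rng(1)
N, M = 4, 4
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
a = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)
r = H @ a
a_zf = conventional_vblast(H, r)
a_mmse = conventional_vblast(H, r, alpha=1e-6)
```

Zeroing rather than deleting columns keeps each row index of $P$ equal to the original stream index, at the price of recomputing a full-size $P$ in every iteration; this repeated inversion is the cost that the square-root algorithms below aim to avoid.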
1. Simulation Results 

The simulation is performed using the following parameters: 

TABLE I: SIMULATION PARAMETERS

| Parameter | Value |
|---|---|
| Antenna configurations (transmit × receive) | 4x4 and 12x12 |
| Input image dimension | 384x256 |
| SNR (dB) | 0 to 25 |
| Compression applied | None |
| Frame size assumed | 4 |
| Channel characteristics | Rayleigh flat fading, varying randomly with every frame |
| Modulation and demodulation applied | 4-, 16-, 64-, 256- and 1024-ary QAM, PAM, PSK |



Figures 2 and 3, panels (a)-(f), compare the BER and SER of zero-forcing (ZF) and minimum mean square error (MMSE) detection. The black curves correspond to MMSE and the blue curves to ZF; for each detector, the five curves from the lowermost (1st) to the uppermost (5th) correspond to 4-, 16-, 64-, 256- and 1024-ary QAM, PAM or PSK modulation, respectively.

Figures 2 and 3 show that the BER and SER obtained with MMSE are lower than those of ZF. This is due to the regularization term $\alpha I_M$ introduced in MMSE, which introduces a bias but leads to much more reliable results than ZF when the channel matrix is ill-conditioned or the channel estimate is noisy. The figures also show that MMSE outperforms ZF only for lower-order constellations, i.e., at lower data rates (4-, 16-, 64- and 256-ary QAM, PAM, PSK); for the higher-order 1024-ary QAM, PAM and PSK schemes, the BER obtained with MMSE and ZF is similar for both antenna configurations. Lower BER and SER can also be achieved by increasing the number of transmitting and receiving antennas [7]. Gaps in the curves indicate a BER of zero (which cannot be shown on a logarithmic axis), i.e., the transmitted image was received without any errors.

In Figure 4, panels (a)-(f), the black curves show the PSNR observed with MMSE detection and the blue curves the PSNR observed with ZF; for each detector, the five curves from the lowermost (1st) to the uppermost (5th) correspond to 1024-, 256-, 64-, 16- and 4-ary QAM, PAM or PSK modulation, respectively.



Figure 2: BER comparison between Zero-Forcing (ZF) (blue) and Minimum Mean Square Error (MMSE) (black): (a) BER observed for a 4x4 antenna configuration with M-QAM [13]; (b) BER observed for a 12x12 antenna configuration with M-QAM; (c) BER observed for a 4x4 antenna configuration with M-PAM [15]; (d) BER observed for a 12x12 antenna configuration with M-PAM; (e) BER observed for a 4x4 antenna configuration with M-PSK [15]; (f) BER observed for a 12x12 antenna configuration with M-PSK.
Figure 3: SER comparison between Zero-Forcing (ZF) (blue) and Minimum Mean Square Error (MMSE) (black): (a) SER observed for a 4x4 antenna configuration with M-QAM [13]; (b) SER observed for a 12x12 antenna configuration with M-QAM; (c) SER observed for a 4x4 antenna configuration with M-PAM; (d) SER observed for a 12x12 antenna configuration with M-PAM; (e) SER observed for a 4x4 antenna configuration with M-PSK; (f) SER observed for a 12x12 antenna configuration with M-PSK.
Figure 4: PSNR comparison between the reconstructed output and the original transmitted image for Zero-Forcing (ZF) (blue) and Minimum Mean Square Error (MMSE) (black): (a) PSNR observed for a 4x4 antenna configuration with M-QAM [13]; (b) PSNR observed for a 12x12 antenna configuration with M-QAM; (c) PSNR observed for a 4x4 antenna configuration with M-PAM [15]; (d) PSNR observed for a 12x12 antenna configuration with M-PAM; (e) PSNR observed for a 4x4 antenna configuration with M-PSK [15]; (f) PSNR observed for a 12x12 antenna configuration with M-PSK.
Figure 5: (a) Total FLOPS required for ZF and MMSE [13]; (b) time required for the 4x4 antenna configuration [13]; (c) time required for the 12x12 antenna configuration; (d) original transmitted image; (e) reconstructed output image of the ZF algorithm at SNR = 0 dB; (f) reconstructed output image of the ZF algorithm at SNR = 20 dB; (g) reconstructed output image of the MMSE algorithm at SNR = 0 dB; (h) reconstructed output image of the MMSE algorithm at SNR = 20 dB.

Figure 4 shows the difference in quality between the image reconstructed at the receiver and the original transmitted image. The quality of the reconstructed image, i.e., the PSNR, is higher for lower-order constellations, i.e., at lower data rates (4-, 16-, 64- and 256-ary QAM, PAM, PSK), whereas for the higher-order 1024-ary QAM, PAM and PSK schemes the PSNR obtained using MMSE and ZF is similar. An improvement in PSNR is also observed when the number of transmitting and receiving antennas is increased [7]. Gaps in the curves indicate an infinite PSNR, i.e., an error-free reconstruction.
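PSNR is not defined explicitly in the text. A common definition, assumed here together with 8-bit pixels (peak value 255), is PSNR = 10 log10(255^2 / MSE), where MSE is the mean squared pixel error between the original and reconstructed images; it becomes infinite for an error-free image, which is consistent with the gaps described above. A short sketch (the function name `psnr` is hypothetical):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two images; peak=255 assumes 8-bit pixels."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")            # error-free reception
    return 10.0 * np.log10(peak ** 2 / mse)

# Example with the 384x256 image size of Table I (pixel values are made up)
img = np.full((384, 256), 100, dtype=np.uint8)
noisy = img.copy()
noisy[::2, :] += 1                     # half of the pixels off by one, so MSE = 0.5
```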
Figure 5(a) shows that the number of floating point operations (FLOPS) required for MMSE and ZF increases monotonically with the number of transmitting and receiving antennas from 1 to 10; beyond 10 antennas, MMSE requires increasingly more FLOPS than ZF (one complex multiplication and one complex addition require six and two flops, respectively). Figures 5(b) and 5(c) compare the time required for detection with the ZF and MMSE detection algorithms for the 4x4 and 12x12 arrays, respectively. The time required is directly related to the number of flops required for execution.
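The flop counts quoted above follow from writing complex arithmetic in terms of real operations (conjugation is treated as free). As a worked example, where the last line for a complex inner product of length $n$ is a derived consequence rather than a figure reported in the text:

$$
\begin{aligned}
(p + jq)(s + jt) &= (ps - qt) + j\,(pt + qs) && \text{4 real multiplications + 2 real additions = 6 flops,}\\
(p + jq) + (s + jt) &= (p + s) + j\,(q + t) && \text{2 real additions = 2 flops,}\\
x^H y = \textstyle\sum_{k=1}^{n} x_k^* y_k &\;\Rightarrow\; 6n + 2(n - 1) = 8n - 2 \text{ flops} && \text{for } x, y \in \mathbb{C}^n.
\end{aligned}
$$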

Figure 5(d) is the original transmitted image, and Figures 5(e)-(h) are the images reconstructed at the receiver for SNR = 0 and 20 dB with the ZF and MMSE algorithms. The quality of the image is directly related to the observed BER; since MMSE achieves a lower BER than ZF, the quality of the image obtained using MMSE is higher than that obtained using ZF.

B. An Efficient Square-Root Algorithm 

The main computational bottleneck in the basic BLAST detection algorithm is the "nulling and cancellation" step, where the optimal ordering for the sequential estimation and detection of the received signals is determined. The efficient square-root algorithm for BLAST [8] reduces the computational cost of this step. The algorithm is numerically stable since it is division-free and uses only orthogonal transformations such as Householder transformations or a sequence of Givens rotations [9], [10]. Its numerical stability also makes it attractive for implementation in fixed-point rather than floating-point architectures.

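Initialization:

a) Let $m = M$. Compute the square root $P^{1/2}$ of $P$ and the matrix $Q_\alpha$ by forming the so-called $(M+N+1) \times (M+1)$ pre-array and propagating it $N$ times, starting from $P_0^{1/2} = (1/\sqrt{\alpha}) I_M$ and $B_0 = 0_{N \times M}$ (the $N \times M$ zero matrix):

$$ \begin{bmatrix} 1 & h^i P_{i-1}^{1/2} \\ 0_M & P_{i-1}^{1/2} \\ -e_i & B_{i-1} \end{bmatrix} \Theta_i = \begin{bmatrix} \times & 0_M^T \\ \times & P_i^{1/2} \\ \times & B_i \end{bmatrix}, \qquad i = 1, 2, \ldots, N, \tag{16} $$

where $e_i$ is an $N \times 1$ vector of all zeros except for the $i$-th entry, which is unity, and $P^{1/2} = P_N^{1/2}$ is a square root of the $M \times M$ estimation error covariance matrix for MMSE, given by

$$ P = (H^H H + \alpha I_M)^{-1}, \tag{17} $$

so that $P = P^{1/2} (P^{1/2})^H$. $B_i$ is an $N \times M$ sub-matrix of the post-array, with $B_N = Q_\alpha$; "×" denotes entries that are not relevant at this time, and $\Theta_i$ is any unitary transformation that block lower-triangularizes the pre-array.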
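As a minimal NumPy sketch of the initialization (16) (the function name `square_root_init` is hypothetical), $\Theta_i$ is taken from a complete QR factorization of the conjugated first row of the pre-array instead of an explicit sequence of Householder or Givens operations; any unitary $\Theta_i$ that zeros that row apart from its first entry is admissible. The checks confirm that $P^{1/2}$ is a square root of (17) and that $B_N = Q_\alpha = H P^{1/2}$.

```python
import numpy as np

def square_root_init(H, alpha):
    """Propagate the pre-array of (16) N times; returns (P^{1/2}, Q_alpha = B_N)."""
    N, M = H.shape
    P_half = np.eye(M, dtype=complex) / np.sqrt(alpha)   # P_0^{1/2}
    B = np.zeros((N, M), dtype=complex)                  # B_0
    for i in range(N):
        e_i = np.zeros((N, 1))
        e_i[i, 0] = 1.0
        top = np.hstack([np.ones((1, 1)), H[i:i + 1, :] @ P_half])  # [1, h^i P_{i-1}^{1/2}]
        pre = np.vstack([top,
                         np.hstack([np.zeros((M, 1)), P_half]),
                         np.hstack([-e_i, B])])
        # Theta_i: with top^H = Q R (complete QR), top @ Q = R^H = [x, 0, ..., 0],
        # so pre @ Q is block lower triangular as required by (16).
        Theta, _ = np.linalg.qr(top.conj().T, mode="complete")
        post = pre @ Theta
        P_half = post[1:M + 1, 1:]                       # P_i^{1/2}
        B = post[M + 1:, 1:]                             # B_i
    return P_half, B

rng = np.random.default_rng(2)
N, M, alpha = 4, 4, 0.1
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
P_half, Q_alpha = square_root_init(H, alpha)
P = np.linalg.inv(H.conj().T @ H + alpha * np.eye(M))
```

The relation $P^{1/2} Q_\alpha^H = P H^H$ means the MMSE nulling matrix of (8) is available in factored form, which is what the single-column estimate in the iterative steps exploits.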



Iterative Detection: 

b) Find the minimum-length row of $P^{1/2}$ and permute it to be the last ($M$-th) row; permute the columns of $H$ accordingly.



Figure 6: (a) Total FLOPS required for ZF, MMSE and an efficient square-root algorithm employing MMSE [13]; (b) time required for detection for the 4x4 antenna configuration with the efficient square-root algorithm [13]; (c) time required for detection for the 12x12 antenna configuration with the efficient square-root algorithm (black: conventional detection with MMSE; red: conventional detection with ZF; blue: efficient square-root algorithm employing MMSE).
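c) Find a unitary transformation $\Sigma$ such that $P^{1/2}\Sigma$ is block upper triangular:

$$ P^{1/2} \Sigma = \begin{bmatrix} P^{(M-1)/2} & p_M^{(M-1)/2} \\ 0_{M-1}^T & p_M^{1/2} \end{bmatrix}, \tag{18} $$

where $P^{(M-1)/2}$ is an $(M-1) \times (M-1)$ square root of the error covariance matrix of the deflated problem (6), and $p_M^{(M-1)/2}$ and $p_M^{1/2}$ denote the last $(M-1) \times 1$ sub-column and the $(M, M)$-th scalar entry, respectively.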



d) Update $Q_\alpha$ to $Q_\alpha \Sigma$.

e) Form the linear MMSE estimate of $a_M$,

$$ \tilde{a}_M = p_M^{1/2}\, q_{\alpha,M}^H\, r^{(M)}, \tag{19} $$

where $q_{\alpha,M}$ is the $M$-th column of $Q_\alpha$.

f) Obtain $\hat{a}_M$ from $\tilde{a}_M$ via slicing.

g) Cancel the interference of $\hat{a}_M$ in $r^{(M)}$ to obtain the reduced-order problem by

$$ r^{(M-1)} = r^{(M)} - h_M \hat{a}_M. \tag{20} $$

h) If $m > 1$, let $m = m - 1$ and go back to step b), using the corresponding $r^{(m-1)}$, $H_{m-1}$, $P^{(m-1)/2}$ and $Q_\alpha^{(m-1)}$ (the first $m-1$ columns of $Q_\alpha \Sigma$) instead of $P^{1/2}$ and $Q_\alpha$.
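The iterative steps b)-h) can be sketched as below (hypothetical function names; a unit-energy QPSK slicer is assumed). To keep the block short, $P^{1/2}$ and $Q_\alpha$ are initialized from a thin QR factorization of the stacked matrix $[H; \sqrt{\alpha} I_M] = Q_s R_s$, which yields the same quantities as (16) ($P^{1/2} = R_s^{-1}$, $Q_\alpha = H P^{1/2}$) but is a swapped-in shortcut rather than the recursive pre-array. The unitary $\Sigma$ of (18) is obtained from a QR factorization of the reversed last row.

```python
import numpy as np

def qpsk_slice(y):
    """Slicer of step f) for unit-energy QPSK."""
    return (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)

def zero_last_row(F):
    """Unitary Sigma such that F @ Sigma has last row [0, ..., 0, x], as in (18)."""
    rev = F[-1, ::-1].conj().reshape(-1, 1)       # reversed, conjugated last row as a column
    Q, _ = np.linalg.qr(rev, mode="complete")     # reversed row @ Q = [x, 0, ..., 0]
    return Q[::-1, ::-1]                          # Sigma = J Q J, J the exchange matrix

def efficient_sqrt_detect(H, r, alpha):
    N, M = H.shape
    # Shortcut for (16): [H; sqrt(alpha) I] = Qs Rs gives P^{1/2} = Rs^{-1}, Q_alpha = H P^{1/2}.
    Qs, Rs = np.linalg.qr(np.vstack([H, np.sqrt(alpha) * np.eye(M)]))
    P_half, Q_alpha = np.linalg.inv(Rs), Qs[:N, :]
    Hm, r = H.astype(complex), r.astype(complex)
    idx = list(range(M))                          # original stream index of each row
    a_hat = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        # b) move the minimum-length row of P^{1/2} to the last position
        j = int(np.argmin(np.sum(np.abs(P_half) ** 2, axis=1)))
        order = [i for i in range(m) if i != j] + [j]
        P_half, Hm, idx = P_half[order, :], Hm[:, order], [idx[i] for i in order]
        # c)-d) block upper-triangularize P^{1/2} and apply the same Sigma to Q_alpha
        Sigma = zero_last_row(P_half)
        P_half, Q_alpha = P_half @ Sigma, Q_alpha @ Sigma
        # e) linear MMSE estimate (19); f) slicing
        a_tilde = P_half[-1, -1] * (Q_alpha[:, -1].conj() @ r)
        a_hat[idx[-1]] = qpsk_slice(a_tilde)
        # g) interference cancellation (20); h) deflate and repeat
        r = r - Hm[:, -1] * a_hat[idx[-1]]
        P_half, Q_alpha, Hm, idx = P_half[:-1, :-1], Q_alpha[:, :-1], Hm[:, :-1], idx[:-1]
    return a_hat

# Noiseless 4x4 example with unit-energy QPSK symbols
rng = np.random.default_rng(3)
N, M = 4, 4
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
a = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)
a_hat = efficient_sqrt_detect(H, H @ a, alpha=1e-6)
```

Permuting rows of $P^{1/2}$ together with columns of $H$ leaves $Q_\alpha = H P^{1/2}$ unchanged, which is why $Q_\alpha$ is not permuted in step b).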



1. Simulation Results

Simulation is performed using the parameters from Table I. The BER, SER and PSNR are similar to those obtained for the conventional detection scheme employing MMSE. Figure 6(a) compares the number of FLOPS required by the conventional detection scheme employing MMSE and ZF and by the efficient square-root algorithm for BLAST employing MMSE (one complex multiplication and one complex addition require six and two flops, respectively). The efficient square-root algorithm outperforms the conventional detection scheme in terms of the number of FLOPS required to detect the received symbols: the use of unitary (orthogonal) transformations such as Householder transformations or Givens rotations reduces the computational cost of detection while sustaining the performance of the conventional detection scheme employing MMSE.

Figures 6(b) and 6(c) compare the time required for detection by conventional detection employing ZF and MMSE and by the efficient square-root algorithm employing MMSE for the 4x4 and 12x12 arrays. Owing to the reduction in the number of floating point operations, the time required for detection is reduced when the efficient square-root algorithm is employed with MMSE.

C. An Improved Square-Root Algorithm For V-BLAST

The efficient square-root algorithm for BLAST [8] computes the whole nulling matrix $Q_\alpha^{(m)}$ for each deflated sub-channel matrix, although only one column of each is used (i.e., the optimum nulling vector); the intermediate results $P^{(m)/2}$ computed in the algorithm are discarded without being used. The improved square-root algorithm for BLAST [11] therefore finds the optimum nulling vectors with the help of $P^{(m)/2}$, avoiding the computation of $Q_\alpha^{(m)}$. At the same time, the robustness of the square-root approach is maintained, since no inversion or squaring operation is required.
Initialization:

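a) Let $m = M$. Compute an initial $F = F_M$ as follows:

a1) Set $P_0^{1/2} = (1/\sqrt{\alpha}) I_M$.

a2) Iteratively for $i = 1, 2, \ldots, N$, form the pre-array $\Pi_i$ and block lower-triangularize it:

$$ \Pi_i = \begin{bmatrix} 1 & h^i P_{i-1}^{1/2} \\ 0_M & P_{i-1}^{1/2} \end{bmatrix}, \qquad \Pi_i \Theta_i = \begin{bmatrix} \times & 0_M^T \\ \times & P_i^{1/2} \end{bmatrix}, \tag{21} $$

where "×" denotes entries that are irrelevant at this time and $\Theta_i$ is any unitary transformation that block lower-triangularizes the pre-array $\Pi_i$.

Finally, $F = P_N^{1/2}$ is the square root of $P$, where $P = (H^H H + \alpha I_M)^{-1}$.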

Iterative Detection: 

b) Find the minimum-length row of $F_m$ and permute it to the last row. Permute the columns of $H_m$ accordingly.
Figure 7: (a) Total FLOPS required for the conventional detection scheme with ZF and MMSE, an efficient square-root algorithm employing MMSE and an improved square-root algorithm with MMSE [14]; (b) time required for detection for the 4x4 antenna configuration [14]; (c) time required for detection with the 12x12 antenna configuration (black: conventional detection with MMSE; red: conventional detection with ZF; blue: efficient square-root algorithm employing MMSE; magenta: improved square-root algorithm with MMSE).

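c) Block upper-triangularize $F_m$ by

$$ F_m \Sigma = \begin{bmatrix} F_{m-1} & u_{m-1} \\ 0_{m-1}^T & \lambda_m \end{bmatrix}, \tag{22} $$

where $\Sigma$ is a unitary transformation, $u_{m-1}$ is an $(m-1) \times 1$ column vector and $\lambda_m$ is a scalar.

d) Form the linear MMSE estimate of $a_m$,

$$ \tilde{a}_m = \lambda_m \left[ u_{m-1}^H \;\; \lambda_m^* \right] H_m^H r^{(m)}. \tag{23} $$

e) Obtain $\hat{a}_m$ from $\tilde{a}_m$ via slicing.

f) Cancel the interference of $\hat{a}_m$ in $r^{(m)}$ to obtain the reduced-order problem by

$$ r^{(m-1)} = r^{(m)} - h_m \hat{a}_m. \tag{24} $$

g) If $m > 1$, let $m = m - 1$ and go back to step b), with the corresponding $r^{(m-1)}$, $a_{m-1}$, $H_{m-1}$ and $F_{m-1}$.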
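A sketch of the improved algorithm, combining the initialization (21) with steps b)-g), is given below under the same assumptions as the earlier sketches (hypothetical names, QR-based unitary transformations, unit-energy QPSK). No $Q_\alpha$ is formed: the nulling vector is built from the last column of $F_m \Sigma$ and applied to $H_m^H r^{(m)}$ as in (23).

```python
import numpy as np

def qpsk_slice(y):
    """Slicer of step e) for unit-energy QPSK."""
    return (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)

def initial_F(H, alpha):
    """Initial F = P^{1/2} by the pre-array recursion (21); no Q_alpha is propagated."""
    N, M = H.shape
    F = np.eye(M, dtype=complex) / np.sqrt(alpha)            # a1)
    for i in range(N):                                       # a2), i = 1, ..., N
        top = np.concatenate([[1.0], H[i] @ F])              # [1, h^i P_{i-1}^{1/2}]
        pre = np.vstack([top, np.hstack([np.zeros((M, 1)), F])])
        Theta, _ = np.linalg.qr(top.conj().reshape(-1, 1), mode="complete")
        F = (pre @ Theta)[1:, 1:]                            # P_i^{1/2}
    return F

def zero_last_row(F):
    """Unitary Sigma with F @ Sigma block upper triangular, as in (22)."""
    rev = F[-1, ::-1].conj().reshape(-1, 1)
    Q, _ = np.linalg.qr(rev, mode="complete")
    return Q[::-1, ::-1]

def improved_sqrt_detect(H, r, alpha):
    N, M = H.shape
    F = initial_F(H, alpha)
    Hm, r = H.astype(complex), r.astype(complex)
    idx = list(range(M))                                     # original stream indices
    a_hat = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        j = int(np.argmin(np.sum(np.abs(F) ** 2, axis=1)))   # b) minimum-length row
        order = [i for i in range(m) if i != j] + [j]
        F, Hm, idx = F[order, :], Hm[:, order], [idx[i] for i in order]
        F = F @ zero_last_row(F)                             # c), (22)
        lam, u = F[-1, -1], F[:-1, -1]
        g = lam * np.concatenate([u.conj(), [np.conj(lam)]]) # lambda_m [u^H, lambda_m^*]
        a_tilde = g @ (Hm.conj().T @ r)                      # d), (23)
        a_hat[idx[-1]] = qpsk_slice(a_tilde)                 # e)
        r = r - Hm[:, -1] * a_hat[idx[-1]]                   # f), (24)
        F, Hm, idx = F[:-1, :-1], Hm[:, :-1], idx[:-1]       # g) deflate
    return a_hat

# Noiseless 4x4 example with unit-energy QPSK symbols
rng = np.random.default_rng(4)
N, M = 4, 4
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
a = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)
a_hat = improved_sqrt_detect(H, H @ a, alpha=1e-6)
F0 = initial_F(H, 0.1)
```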

1. Simulation results 

Simulation is performed using the parameters from Table I. The BER, SER and PSNR are similar to those obtained for the conventional detection scheme employing MMSE. Figure 7(a) compares the number of FLOPS required by the conventional detection scheme employing MMSE and ZF, the efficient square-root algorithm for BLAST employing MMSE and the improved square-root algorithm for BLAST employing MMSE (one complex multiplication and one complex addition require six and two flops, respectively). Since the improved square-root algorithm employs unitary transformations and reuses intermediate results for detection, it outperforms the efficient square-root algorithm in terms of the number of FLOPS required to detect the received symbols. Figures 7(b) and 7(c) compare the time required for detection by the conventional detection scheme employing MMSE and ZF, the efficient square-root algorithm employing MMSE and the improved square-root algorithm employing MMSE with the 4x4 and 12x12 arrays. The time required is directly related to the number of flops required for execution.

D. An Improved Square-Root Algorithm For V-BLAST Based On Efficient Inverse Cholesky 
Factorization 

Further reduction in the number of FLOPS is achieved by employing a fast algorithm for inverse Cholesky factorization to compute a triangular square root of the estimation error covariance matrix. This is then applied to obtain an improved square-root algorithm for V-BLAST, which speeds up several steps of the previous one and can offer further computational savings in MIMO orthogonal frequency division multiplexing (OFDM) systems. Compared to the conventional inverse Cholesky factorization, the proposed one avoids the back substitution (of the Cholesky factor) and requires only half the divisions. The algorithm is faster than the existing efficient V-BLAST algorithms [12].

Initialization: 

a) Set $m = M$. Compute $R_M$, $z_M$ and the initial upper-triangular $F = F_M$. This step consists of sub-steps N1-a to N1-d.

N1-a) Assume the successive detection order to be $[t_M, t_{M-1}, \ldots, t_1]$. Correspondingly permute $H$ to be $H = H_M = [h_{t_1}, h_{t_2}, \ldots, h_{t_M}]$.

N1-b) Use the permuted $H$ to compute $R_M = H^H H + \alpha I_M$, from which all $R_{m-1}$, $v_{m-1}$ and $\beta_m$ (for $m = M, M-1, \ldots, 2$) are obtained as

$$ R_m = \begin{bmatrix} R_{m-1} & v_{m-1} \\ v_{m-1}^H & \beta_m \end{bmatrix}. \tag{25} $$

N1-c) Compute $F_1 = \sqrt{1/R_1}$. Then use

$$ \lambda_m = \frac{1}{\sqrt{\beta_m - v_{m-1}^H F_{m-1} F_{m-1}^H v_{m-1}}}, \qquad u_{m-1} = -\lambda_m F_{m-1} F_{m-1}^H v_{m-1}, \qquad F_m = \begin{bmatrix} F_{m-1} & u_{m-1} \\ 0_{m-1}^T & \lambda_m \end{bmatrix} \tag{26} $$

to compute $F_m$ from $F_{m-1}$ iteratively for $m = 2, 3, \ldots, M$, obtaining the initial $F = F_M$.

N1-d) Compute

$$ z_M = H_M^H r^{(M)} = H^H r. \tag{27} $$
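The recursion (26) can be checked with a few lines of NumPy (a sketch; the function name `inverse_cholesky` is hypothetical). It builds the upper-triangular $F_M$ directly from the leading blocks of $R_M$ in (25), so no Cholesky factor is formed and no back substitution is needed; the checks confirm $F F^H = R^{-1} = P$.

```python
import numpy as np

def inverse_cholesky(R):
    """Upper-triangular F with F F^H = R^{-1}, built column by column with (26)."""
    M = R.shape[0]
    F = np.array([[1.0 / np.sqrt(R[0, 0].real)]], dtype=complex)   # F_1 = sqrt(1 / R_1)
    for m in range(2, M + 1):
        v = R[:m - 1, m - 1]                 # v_{m-1}: column m of R_m above the diagonal
        beta = R[m - 1, m - 1].real          # beta_m: (m, m)-th entry of R_m
        t = F @ (F.conj().T @ v)             # F_{m-1} F_{m-1}^H v_{m-1}
        lam = 1.0 / np.sqrt(beta - (v.conj() @ t).real)             # lambda_m
        u = -lam * t                         # u_{m-1}
        F = np.block([[F, u[:, None]],
                      [np.zeros((1, m - 1)), np.array([[lam]])]])
    return F

rng = np.random.default_rng(5)
N, M, alpha = 4, 4, 0.1
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
R = H.conj().T @ H + alpha * np.eye(M)       # R_M from N1-b)
F = inverse_cholesky(R)
```

The only division per step is the one in $\lambda_m$, which is where the saving over computing a Cholesky factor and then inverting it comes from.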

Figure 8: (a) Total FLOPS required for the conventional detection scheme with ZF and MMSE, an efficient square-root algorithm employing MMSE, an improved square-root algorithm with MMSE and an improved square-root algorithm based on Cholesky factorization with MMSE [14]; (b) time required for detection for the 4x4 antenna configuration [14]; (c) time required for detection with the 12x12 antenna configuration (black: conventional detection with MMSE; red: conventional detection with ZF; blue: efficient square-root algorithm employing MMSE; magenta: improved square-root algorithm with MMSE; cyan: improved square-root algorithm based on Cholesky factorization with MMSE).

Iterative Detection: 

b) Find the minimum-length row of $F_m$ and permute it to be the last ($m$-th) row. Correspondingly permute $z_m$ and the rows and columns of $R_m$.

c) Block upper-triangularize $F_m$ by

$$ F_m \Sigma = \begin{bmatrix} F_{m-1} & u_{m-1} \\ 0_{m-1}^T & \lambda_m \end{bmatrix}, \tag{28} $$

where $\Sigma$ is a unitary transformation, $u_{m-1}$ is an $(m-1) \times 1$ column vector and $\lambda_m$ is a scalar.

d) Form the linear MMSE estimate $\tilde{a}_m$ by

$$ \tilde{a}_m = \lambda_m \left[ u_{m-1}^H \;\; \lambda_m^* \right] z_m. \tag{29} $$

e) Obtain $\hat{a}_m$ from $\tilde{a}_m$ via slicing.

f) Cancel the effect of $\hat{a}_m$ in $z_m$ by

$$ z_{m-1} = z_m' - \hat{a}_m v_{m-1}, \tag{30} $$

where $z_m'$ is the permuted $z_m$ with the last entry removed, and $v_{m-1}$ is taken from the permuted $R_m$ as in (25).

g) If $m > 1$, let $m = m - 1$ and go back to step b) with the corresponding $z_{m-1}$, $a_{m-1}$, $R_{m-1}$ and $F_{m-1}$.
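Putting N1-b) to N1-d) and steps b)-g) together gives the following sketch (hypothetical names; the initial ordering of N1-a) is taken to be the natural order, and unit-energy QPSK is assumed). After initialization only $z_m$, $R_m$ and $F_m$ are updated; neither $H$ nor $r$ is touched again, since (30) replaces the explicit cancellation in $r$.

```python
import numpy as np

def qpsk_slice(y):
    """Slicer of step e) for unit-energy QPSK."""
    return (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)

def inverse_cholesky(R):
    """Upper-triangular F with F F^H = R^{-1}, via the recursion (26)."""
    M = R.shape[0]
    F = np.array([[1.0 / np.sqrt(R[0, 0].real)]], dtype=complex)
    for m in range(2, M + 1):
        v, beta = R[:m - 1, m - 1], R[m - 1, m - 1].real
        t = F @ (F.conj().T @ v)
        lam = 1.0 / np.sqrt(beta - (v.conj() @ t).real)
        F = np.block([[F, (-lam * t)[:, None]],
                      [np.zeros((1, m - 1)), np.array([[lam]])]])
    return F

def zero_last_row(F):
    """Unitary Sigma with F @ Sigma block upper triangular, as in (28)."""
    rev = F[-1, ::-1].conj().reshape(-1, 1)
    Q, _ = np.linalg.qr(rev, mode="complete")
    return Q[::-1, ::-1]

def cholesky_sqrt_detect(H, r, alpha):
    N, M = H.shape
    R = H.conj().T @ H + alpha * np.eye(M)          # N1-b), natural order for N1-a)
    F = inverse_cholesky(R)                         # N1-c), (26)
    z = H.conj().T @ r                              # N1-d), (27)
    idx = list(range(M))                            # original stream indices
    a_hat = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        j = int(np.argmin(np.sum(np.abs(F) ** 2, axis=1)))         # b)
        order = [i for i in range(m) if i != j] + [j]
        F, z, R = F[order, :], z[order], R[np.ix_(order, order)]
        idx = [idx[i] for i in order]
        F = F @ zero_last_row(F)                                     # c), (28)
        lam, u = F[-1, -1], F[:-1, -1]
        a_tilde = lam * (u.conj() @ z[:-1] + np.conj(lam) * z[-1])  # d), (29)
        a_hat[idx[-1]] = qpsk_slice(a_tilde)                         # e)
        z = z[:-1] - a_hat[idx[-1]] * R[:-1, -1]                     # f), (30)
        F, R, idx = F[:-1, :-1], R[:-1, :-1], idx[:-1]               # g)
    return a_hat

# Noiseless 4x4 example with unit-energy QPSK symbols
rng = np.random.default_rng(6)
N, M = 4, 4
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
a = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)
a_hat = cholesky_sqrt_detect(H, H @ a, alpha=1e-6)
```

The update (30) works because the off-diagonal column $v_{m-1}$ of the permuted $R_m$ equals $H_{m-1}^H h_m$ (the $\alpha$ term only affects the diagonal), so subtracting $\hat{a}_m v_{m-1}$ from $z_m'$ is the same as forming $H_{m-1}^H (r^{(m)} - h_m \hat{a}_m)$.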

1. Simulation Results

Simulation is performed using the parameters from Table I. The performance parameters such as BER, 
SER and PSNR are similar to the results obtained for conventional detection scheme employing MMSE. 

Figure 8(a) compares the number of FLOPS required by the conventional detection scheme employing MMSE and ZF, the efficient square-root algorithm for V-BLAST employing MMSE, the improved square-root algorithm for V-BLAST employing MMSE and the improved square-root algorithm for V-BLAST based on efficient inverse Cholesky factorization employing MMSE (one complex multiplication and one complex addition require six and two flops, respectively). The improved square-root algorithm based on efficient inverse Cholesky factorization outperforms all the above-mentioned algorithms and is faster than the existing efficient V-BLAST algorithms.

Figures 8(b) and 8(c) compare the time required for detection. Owing to the reduction in the number of floating point operations, the time required for detection is reduced when the improved square-root algorithm for V-BLAST based on efficient inverse Cholesky factorization is employed with MMSE.

IV. CONCLUSION 

This paper provides a detailed comparison of various detection schemes employed in V-BLAST systems with M-QAM, M-PAM and M-PSK modulation for 4x4 and 12x12 arrays. The parameters considered include BER, SER, PSNR and FLOPS. Simulation results show that the best performance for V-BLAST is achieved with the M-QAM modulation scheme, and that MMSE outperforms ZF in terms of BER, SER and PSNR at the cost of an increase in the number of FLOPS. With 16 transmitting and 16 receiving antennas, MMSE requires 6.2 × 10^6 FLOPS, whereas ZF requires 5.8 × 10^6 FLOPS. The efficient square-root algorithm with MMSE, which employs unitary transformations to avoid squaring and matrix inversion operations, reduces the number of FLOPS required for detection: with 16 transmitting and 16 receiving antennas it requires 4.8 × 10^6 FLOPS, a reduction of 1 × 10^6 and 1.4 × 10^6 FLOPS compared to the conventional detection scheme employing ZF and MMSE, respectively. The improved square-root algorithm reduces the number of FLOPS further by reusing intermediate results that the efficient square-root algorithm discards without use. With MMSE and 16 transmitting and 16 receiving antennas, it requires 3.5 × 10^6 FLOPS, a reduction of 1.3 × 10^6 FLOPS compared to the efficient square-root algorithm with MMSE, and of 2.3 × 10^6 and 2.7 × 10^6 FLOPS compared to the conventional detection scheme employing ZF and MMSE, respectively. A further reduction in the number of FLOPS is achieved by employing a fast algorithm to compute a triangular square root of the estimation error covariance matrix: the improved square-root algorithm based on efficient inverse Cholesky factorization with MMSE requires 0.6 × 10^6 FLOPS for 16 transmitting and 16 receiving antennas, a reduction of 2.9 × 10^6, 4.2 × 10^6, 5.2 × 10^6 and 5.6 × 10^6 FLOPS compared to the improved square-root algorithm, the efficient square-root algorithm and the conventional detection scheme employing ZF and MMSE, respectively. This algorithm is faster than the existing efficient V-BLAST algorithms.